Module 1: Introduction to Big Data Analytics

The Data Scientist 


During this part the following topics are covered: 

• Key Roles of the New Big Data Ecosystem 

• Profile of a Data Scientist 
Skills Needed in the New Data Ecosystem

Your Thoughts? 

• What new skill sets do you need to take advantage of the big 
data sets in the loan processing improvement case study? 

• Do most large organizations have people with these skill sets? 

• If so, who are they? 
Three Key Roles of the New Data Ecosystem



Role: Deep Analytical Talent (Data Scientists)
Role Description: People with advanced training in quantitative disciplines, such as mathematics, statistics, and machine learning.

Role: Data Savvy Professionals (Analysts & Data Savvy Managers)
Role Description: People with a basic knowledge of statistics and/or machine learning, who can define key questions that can be answered using advanced analytics.

Role: Technology & Data Enablers
Role Description: People providing technical expertise to support analytical projects. Skill sets include computer programming and database administration.

Note: The figures on this slide reflect a projected talent gap in the US in 2018, as shown in the McKinsey May 2011 article Big Data: The next frontier for innovation, competition, and productivity
Data Scientist Key Activities


• Reframe business challenges as analytics challenges

• Design, implement and deploy statistical models and data mining techniques on big data

• Create insights that lead to actionable recommendations


[Figure: roles in the data ecosystem (Data Scientists, Data Engineers, Data Analyst, BI Analyst, User, Data Platform Admin) shown across three layers: Analytic Productivity Platform, Tools & Services, and Infrastructure]
Profile of a Data Scientist



• Quantitative

• Technical

• Skeptical

• Curious & Creative

• Communicative & Collaborative
Module 1: Introduction to Big Data Analytics


Summary 

During this part the following topics were covered: 

• Key Roles of the New Big Data Ecosystem 

• Profile of a Data Scientist 
Module 1: Introduction to Big Data Analytics

Big Data Analytics in Industry Verticals 

During this part we cover the following representative examples: 

• Health Care 

• Public Services 

• Life Sciences 

• IT Infrastructure 

• Online Services 
Big Data Analytics: Industry Examples


1. Health Care

• Reducing Cost of Care

2. Public Services

• Preventing Pandemics

3. Life Sciences

• Genomic Mapping

4. IT Infrastructure

• Unstructured Data Analysis

5. Online Services

• Social Media for Professionals
Big Data Analytics: Healthcare


Situation

• Poor police response and problems with medical care, triggered by shooting of a student

• The event drove local doctor to map crime data and examine local health care

Use of Big Data

• Dr. Jeffrey Brenner generated his own crime maps from medical billing records of 3 hospitals

Key Outcomes

• City hospitals provided expensive care, low quality care

• Reduced hospital costs by 56% by realizing that 80% of city’s medical costs came from 13% of its residents, mainly low-income or elderly

• Now offers preventative care over the phone or through home visits
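
The cost-concentration finding above (a small share of residents driving most of the cost) is the kind of result a simple ranking over billing totals can surface. The following is a minimal sketch of that calculation, not the method used in the case study; the function name `smallest_share_covering` and the billing figures are hypothetical and made up for illustration.

```python
from typing import Sequence


def smallest_share_covering(costs: Sequence[float], target_share: float) -> float:
    """Return the smallest fraction of people whose combined cost reaches
    `target_share` of the total, counting the costliest people first."""
    total = sum(costs)
    if total <= 0 or not 0 < target_share <= 1:
        raise ValueError("need a positive total cost and 0 < target_share <= 1")
    running = 0.0
    for count, cost in enumerate(sorted(costs, reverse=True), start=1):
        running += cost
        if running >= target_share * total:
            return count / len(costs)
    return 1.0


# Hypothetical annual billing totals for 10 residents (invented numbers).
billing = [50_000, 31_000, 4_000, 3_000, 3_000, 2_500, 2_500, 2_000, 1_000, 1_000]
share = smallest_share_covering(billing, 0.80)
print(f"{share:.0%} of residents account for at least 80% of billed cost")
```

On real billing records the same ranking would be run per resident after aggregating all of that resident's claims, which is where the volume of data becomes the main difficulty.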




Copyright © 2014 EMC Corporation. All Rights Reserved.


Big Data Analytics: Public Services




Situation

• Threat of global pandemics has increased exponentially

• Pandemics spread at faster rates and are more resistant to antibiotics

Use of Big Data

• Created a network of viral listening posts

• Combines data from viral discovery in the field, research in disease hotspots, and social media trends

• Using Big Data to make accurate predictions on the spread of new pandemics

Key Outcomes

• Identified a fifth form of human malaria, including its origin

• Identified why efforts failed to control swine flu

• Proposing more proactive approaches to preventing outbreaks
Big Data Analytics: Life Sciences




Situation

• Broad Institute (MIT & Harvard) mapping the Human Genome

Use of Big Data

• In 13 yrs, mapped 3 billion genetic base pairs; 8 petabytes

• Developed 30+ software packages, now shared publicly, along with the genomic data

Key Outcomes

• Using genetic mappings to identify cellular mutations causing cancer and other serious diseases

• Innovating how genomic research informs new pharmaceutical drugs
Big Data Analytics: IT Infrastructure







Situation

• Explosion of unstructured data required new technology to analyze quickly and efficiently

Use of Big Data

• Doug Cutting created Hadoop to divide large processing tasks into smaller tasks across many computers

• Analyzes social media data generated by hundreds of thousands of users

Key Outcomes

• New York Times used Hadoop to transform its entire public archive, from 1851 to 1922, into 11 million PDF files in 24 hrs.

• Applications range from social media, sentiment analysis, wartime chatter, natural language processing
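
The idea of dividing a large processing task into smaller tasks and then combining the partial results can be illustrated with a word count in plain Python. This is a minimal sketch of the map/reduce pattern only, not Hadoop's API: local threads stand in for the many computers of a cluster, and the helper names (`map_count`, `split_into_chunks`, `word_count`) and sample lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count(chunk: list[str]) -> Counter:
    """Map step: count the words in one chunk of lines."""
    counts: Counter = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(lines: list[str], n_chunks: int) -> list[list[str]]:
    """Divide the input into roughly equal pieces, one per worker."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def word_count(lines: list[str], workers: int = 4) -> Counter:
    """Count words by mapping over chunks in parallel, then merging."""
    chunks = split_into_chunks(lines, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_count, chunks))
    # Reduce step: merge the partial counts into a single result.
    return reduce(lambda a, b: a + b, partials, Counter())


# Made-up sample input standing in for a large text archive.
sample_lines = ["big data big", "Data science", "big ideas"]
result = word_count(sample_lines, workers=2)
print(result.most_common(2))
```

Because each chunk is counted independently, the map step can be spread over as many workers as are available; only the final merge needs to see all partial results.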
Big Data Analytics: Online Services



Situation

• Opportunity to create social media space for professionals

Use of Big Data

• Collects and analyzes data from over 100 million users

• Adding 1 million new users per week

Key Outcomes

• LinkedIn Skills, InMaps, Job Recommendations, Recruiting

• Established a diverse data scientist group, as founder believes this is the start of Big Data revolution
Module 1: Introduction to Big Data Analytics


Summary 

During this part the following representative examples were covered: 

• Health Care 

• Public Services 

• Life Sciences 

• IT Infrastructure 

• Online Services 
Check Your Knowledge


Your Thoughts?

1. What are the 3 characteristics of Big Data, and the main considerations in processing Big Data?

2. What is an analytic sandbox - Data EcoSystem?

3. Explain the difference between Business Intelligence and Data Science.

4. Describe the challenges of the current analytical architecture for Data Scientists.

5. What are the key skill sets and behavioral characteristics of a Data Scientist?
Module 1: Summary


Key points covered in this module: 

• Big data was defined 

• Four business drivers for advanced analytics were identified 

• The techniques for Business Intelligence were distinguished from 
those of Data Science 

• The role of the Data Scientist within the new big data ecosystem 
was described 

• Multiple illustrative examples of big data opportunities were 
cited 
Adv. Methods



Thanks 